OOD detection performance
Supplementary Material A Evaluation on CIFAR Benchmarks
Setup We additionally evaluate GradNorm on a common benchmark with CIFAR-10 and CIFAR-100 [22] as ID datasets, a setup routinely used in the literature [13, 27, 14, 29, 26]. We use the standard split with 50,000 training images and 10,000 test images. The learning rate starts at 0.1 and decays by a factor of 10 at epochs 50, 75, and 90.
Results We summarize the results in Table 6, where GradNorm remains competitive. In particular, GradNorm reduces the average FPR95 by 8.77% on CIFAR-10 compared to the best baseline.
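The step learning-rate schedule described in the setup can be sketched in a few lines of Python. This is a minimal illustration, not the authors' training code: the function name `learning_rate_at_epoch` and its parameters are hypothetical, and we assume that epochs are 0-indexed and that each decay takes effect at the start of its milestone epoch, which the text does not specify.

```python
def learning_rate_at_epoch(epoch, base_lr=0.1, milestones=(50, 75, 90), decay_factor=10.0):
    """Return the step-decayed learning rate for a given epoch.

    Assumptions (not stated in the source): epochs are 0-indexed, and the
    decay applies from the start of each milestone epoch onward.
    """
    # Count how many milestones have been reached by this epoch.
    num_decays = sum(1 for milestone in milestones if epoch >= milestone)
    # Divide the base rate by the decay factor once per milestone reached.
    return base_lr / (decay_factor ** num_decays)


if __name__ == "__main__":
    # Print the rate just before and at each milestone.
    for epoch in (0, 49, 50, 74, 75, 89, 90):
        print(epoch, learning_rate_at_epoch(epoch))
```

Under these assumptions the rate stays at 0.1 through epoch 49 and then drops to roughly 0.01, 0.001, and 0.0001 at epochs 50, 75, and 90. In a PyTorch training loop, the built-in `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[50, 75, 90]` and `gamma=0.1` would express the same schedule.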
The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection
Out-of-distribution (OOD) detection is essential for model trustworthiness, which requires a model to sensitively identify semantic OOD samples while generalizing robustly to covariate-shifted OOD samples. However, we discover that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing OOD generalization ability. Classification accuracy frequently collapses catastrophically when even slight noise is encountered. Such a phenomenon violates the motivation of trustworthiness and significantly limits the model's deployment in the real world. What is the hidden reason behind such a limitation?